The Computer Learner Corpus: A Versatile New Source of Data for SLA Research
Abstract
Since making its first appearance in the 1960s, the computer corpus has infiltrated all fields of language-related research, from lexicography to literary criticism through artificial intelligence and language teaching. This widespread use of the computer corpus has led to the development of a new discipline which has come to be called 'corpus linguistics', a term which refers not just to a new computer-based methodology but, as Leech (1992: 106) puts it, to a 'new research enterprise', a new way of thinking about language, which is challenging some of our most deeply rooted ideas about language. With its focus on performance (rather than competence), description (rather than universals) and quantitative as well as qualitative analysis, it can be seen as contrasting sharply with the Chomskyan approach and indeed is presented as such by Leech (ibid.: 107). The two approaches are not mutually exclusive, however. Comparing the respective merits of corpus linguistics and what he ironically calls 'armchair linguistics', Fillmore (1992: 35) comes to the conclusion that 'the two kinds of linguists need each other. Or better, that the two kinds of linguists, wherever possible, should exist in the same body.'
The computer plays a central role in corpus linguistics. A first major advantage of computerization is that it liberates language analysts 'from drudgery and empowers [them] to focus their creative energies on doing what machines cannot do' (Rundell and Stock 1992: 14). More fundamental, however, is the heuristic power of automated linguistic analysis, i.e. its power to uncover totally new facts about language. It is this aspect rather than 'the mirroring of intuitive categories of description' (Sinclair 1986: 202) that is the most novel and exciting contribution of corpus linguistics.
English is undoubtedly the language which has been analysed most from a corpus linguistics perspective. Indeed, the first computer corpus to be compiled was the Brown corpus, a corpus of American English. Since then, English corpora have grown and diversified. At the time, the one million words contained in the Brown and the LOB were considered to be perfectly ample for research purposes, but they now appear microscopic in comparison to the 100 million words of the British National Corpus or the 200 million words of the Bank of English. This growth in corpus size over the years has been accompanied by a huge diversification of corpus types to cover a wide range of varieties: diachronic, stylistic (spoken vs. written; general …
Similar resources
Computer Learner Corpus Research: Current Status and Future Prospects
Despite a mere decade of existence, the field of computer learner corpus (CLC) research has been the focus of so much active international work that it seems worth taking a retrospective look at the research accomplished to date and considering the prospects for future research in both Second Language Acquisition (SLA) studies and Foreign Language Teaching (FLT) that emerge. One of the main dis...
ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study
This paper describes a pilot study in which we collected a small learner corpus of Arabic, developed a tagset for error annotation of Arabic learner data, tagged the data for error, and performed simple Computer-aided Error Analysis (CEA). Language Learner Corpora and Applications: Learner corpora research uses the methods and tools of Second Language Acquisition (SLA) studies and corpus linguist...
Hedges in English for Academic Purposes: A Corpus-based Study of Iranian EFL Learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
On the Automatic Analysis of Learner Language. Introduction to the Special Issue
Natural language processing (NLP) has long been used to automatically analyze language produced by language learners, typically aimed at providing individualized feedback and learner modeling in Intelligent Computer-Assisted Language Learning systems (cf. Heift & Schulze 2007). While much interesting research has been reported, it is difficult to determine the state of the art for the automatic...
Metadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s. However, given the importance of metadiscourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners' academic essays is an area of research in need of a more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
Publication date: 2013